Logistic Regression

We want the hypothesis $h_{\theta}(x)$ to lie in the range $0 \leq h_{\theta}(x) \leq 1$,

$$ \begin{gathered} h_{\theta}(x)=g(\theta^{T} x) \\ g(z)=\frac{1}{1+e^{-z}}, \end{gathered} $$

where $g$ is called the "sigmoid" or "logistic" function.
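As a quick sketch, the sigmoid can be written in NumPy as follows (function name is illustrative):

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

# g(0) = 0.5; large |z| saturates toward 1 or 0
print(sigmoid(0.0))
```

Note that $g(z) \to 1$ as $z \to \infty$ and $g(z) \to 0$ as $z \to -\infty$, which is what keeps $h_{\theta}(x)$ inside $[0, 1]$.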

Cost function

Forward pass

$$ \operatorname{Cost}\left(h_{\theta}(x), y\right)=\left\{\begin{aligned} -\log \left(h_{\theta}(x)\right) &, \text { if } y=1 \\ -\log \left(1-h_{\theta}(x)\right) &, \text { if } y=0 \end{aligned}\right., $$

which can be simplified as,

$$ \operatorname{Cost}\left(h_{\theta}(x), y\right)= -y \log \left(h_{\theta}(x)\right)-(1-y) \log \left(1-h_{\theta}(x)\right) $$

As a result, the overall cost function can be written as follows,

$$ \begin{aligned} J(\theta) &=\frac{1}{m} \sum_{i=1}^{m} \operatorname{Cost}\left(h_{\theta}\left(x^{(i)}\right), y^{(i)}\right) \\ &=-\frac{1}{m}\left[\sum_{i=1}^{m} y^{(i)} \log h_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right) \log \left(1-h_{\theta}\left(x^{(i)}\right)\right)\right] \end{aligned} $$
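A minimal vectorized sketch of $J(\theta)$, assuming `X` is the $m \times n$ design matrix and `y` holds labels in $\{0, 1\}$ (the `sigmoid` helper is repeated so the snippet is self-contained):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta) for logistic regression.

    X: (m, n) design matrix, y: (m,) labels in {0, 1}.
    """
    m = len(y)
    h = sigmoid(X @ theta)  # h_theta(x^{(i)}) for every example
    return -(y @ np.log(h) + (1.0 - y) @ np.log(1.0 - h)) / m
```

With $\theta = 0$ every prediction is $0.5$, so $J(\theta) = -\log(0.5) = \log 2 \approx 0.693$ regardless of the labels, a handy sanity check.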

Backward pass (Gradient Descent)

To minimize $J(\theta)$, $$ \begin{aligned} \theta_{j} &:=\theta_{j}-\alpha \frac{\partial}{\partial \theta_{j}} J(\theta) \\ & = \theta_{j}-\alpha \frac{1}{m} \sum_{i=1}^{m}\left(h_{\theta}\left(x^{(i)}\right)-y^{(i)}\right) x_{j}^{(i)}, \end{aligned} $$ (simultaneously update all $\theta_{j}$)

The gradient can be calculated in a vectorized fashion,

$$ \frac{\partial}{\partial \theta} J(\theta) =\frac{1}{m} \mathrm{X}^{\mathrm{T}}\left[h_{\theta}(x)-y\right] $$
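Putting the vectorized gradient into a batch gradient-descent loop might look like this (hyperparameter defaults are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.5, iters=5000):
    """Batch gradient descent using grad = X^T (h - y) / m."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ theta) - y) / m
        theta -= alpha * grad  # simultaneous update of all theta_j
    return theta
```

For example, on the 1-D data `x = [0, 1, 2, 3]` with labels `[0, 0, 1, 1]` (and a column of ones for the intercept), the learned boundary lands between $x = 1$ and $x = 2$, so thresholding $h_{\theta}(x)$ at $0.5$ recovers the labels.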

Decision Boundary

In this example, the hypothesis is parameterized by $\theta=\left[\begin{array}{c} \theta_0 \\ \theta_1 \\ \theta_2 \end{array}\right]$, where $ h_{\theta}(x)=g\left(\theta_{0}+\theta_{1} x_{1}+\theta_{2} x_{2}\right) $

The model predicts "$y=1$" if $\theta_0+\theta_1 x_1+\theta_2 x_2 \geq 0$, and predicts "$y=0$" if $\theta_0+\theta_1 x_1+\theta_2 x_2 < 0$. Since $g(z) \geq 0.5$ exactly when $z \geq 0$, this is the same as thresholding $h_{\theta}(x)$ at $0.5$.
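The decision rule can be sketched directly on $\theta^{T} x$, with no need to evaluate the sigmoid (the parameter values below are made up purely to illustrate a boundary $x_1 + x_2 = 3$):

```python
import numpy as np

def predict(theta, X):
    """Predict y = 1 when theta^T x >= 0 (equivalently h_theta(x) >= 0.5)."""
    return (X @ theta >= 0).astype(int)

# Illustrative parameters: decision boundary x1 + x2 = 3
theta = np.array([-3.0, 1.0, 1.0])
X = np.array([[1.0, 1.0, 1.0],    # x1 + x2 = 2 -> below boundary, predict 0
              [1.0, 2.0, 2.0]])   # x1 + x2 = 4 -> above boundary, predict 1
print(predict(theta, X))
```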